Haplotype Inference Via Hierarchical Genotype Parsing

Authors

  • Pasi Rastas
  • Esko Ukkonen
Abstract

Within-species genetic variation due to recombination leads to a mosaic-like structure of DNA. This structure can be modeled, for example, by parsing sample sequences of current DNA with respect to a small number of founders. The founders represent the ancestral sequence material from which the sample was created in a sequence of recombination steps. This scenario has recently been applied successfully to developing probabilistic hidden Markov methods for haplotyping genotypic data. In this paper we introduce a combinatorial method for haplotyping that is based on a similar parsing idea. We formulate a polynomial-time parsing algorithm that finds a minimum cross-over parse in a simplified ‘flat’ parsing model that ignores the historical hierarchy of recombinations. The problem of constructing optimal founders that would give the minimum possible parse for given genotypic sequences is shown to be NP-hard. A heuristic, locally optimal algorithm is given for founder construction. Combined with flat parsing, this already gives quite good haplotyping results. Improved haplotyping is obtained by using a hierarchical parsing that properly models the natural recombination process. For finding short hierarchical parses, a greedy polynomial-time algorithm is given. Empirical haplotyping results on HapMap data are reported.
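To make the flat parsing idea concrete, here is a minimal sketch under a simplifying assumption: the input is an already phased haplotype rather than a genotype, so this is not the paper's genotype parsing algorithm. It uses a standard dynamic program to count the fewest switches between founders needed to spell out a sequence. The function name `min_crossovers` and the string encoding of alleles are illustrative choices, not taken from the paper.

```python
from math import inf


def min_crossovers(sequence, founders):
    """Return the fewest founder switches needed to spell `sequence`,
    or None if no parse exists. Hypothetical helper, haplotype case only."""
    if not founders or any(len(f) != len(sequence) for f in founders):
        raise ValueError("founders must be non-empty and as long as the sequence")

    # cost[j] = fewest switches for a parse of the prefix read so far
    # whose last position is copied from founder j.
    cost = [0] * len(founders)
    for i, allele in enumerate(sequence):
        best_prev = min(cost)  # cheapest parse of the previous prefix
        new_cost = []
        for j, founder in enumerate(founders):
            if founder[i] != allele:
                # Founder j cannot supply this position.
                new_cost.append(inf)
            else:
                # Either stay on founder j, or switch into it (one cross-over).
                new_cost.append(min(cost[j], best_prev + 1))
        cost = new_cost

    best = min(cost)
    return None if best == inf else best


if __name__ == "__main__":
    founders = ["0011", "1100"]
    for seq in ["0011", "0000", "0101"]:
        print(seq, min_crossovers(seq, founders))
```

The running time of this sketch is proportional to the sequence length times the number of founders. Handling genotypes, where each site constrains a pair of haplotypes at once, requires tracking pairs of founders and is not covered here.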


Similar resources

A coalescence-guided hierarchical Bayesian method for haplotype inference.

Haplotype inference from phase-ambiguous multilocus genotype data is an important task for both disease-gene mapping and studies of human evolution. We report a novel haplotype-inference method based on a coalescence-guided hierarchical Bayes model. In this model, a hierarchical structure is imposed on the prior haplotype frequency distributions to capture the similarities among modern-day hapl...


A Nonparametric Bayesian Approach for Haplotype Reconstruction from Single and Multi-Population Data

Uncovering the haplotypes of single nucleotide polymorphisms and their population demography is essential for many biological and medical applications. Methods for haplotype inference developed thus far –including those based on approximate coalescence, finite mixtures, and maximal parsimony– often bypass issues such as unknown complexity of haplotype-space and demographic structures underlying...


A New Mathematical Model for the Problem of Inferring Haplotypes from Genotypes under the Parsimony Criterion

Haplotype inference is one of the most important problems in bioinformatics because of its many applications in the diagnosis and treatment of inherited diseases such as diabetes, Alzheimer's and heart disease. This has led researchers to compete in proposing mathematical models and designing algorithms to solve the problem. Despite the existence of ...


A Hierarchical Dirichlet Process Mixture Model for Haplotype Reconstruction from Multi-population Data, by Kyung-ah Sohn

The perennial problem of “how many clusters?” remains an issue of substantial interest in data mining and machine learning communities, and becomes particularly salient in large data sets such as populational genomic data where the number of clusters needs to be relatively large and open-ended. This problem gets further complicated in a co-clustering scenario in which one needs to solve multipl...


Haplotype Inference by Pure Parsimony via Genetic Algorithm

Haplotypes are especially important in the study of complex diseases since they contain more information about gene alleles than genotype data. However, obtaining haplotype data via experimental methods is technically difficult and expensive. Thus, haplotype inference through computational methods is practical and attractive. There are several models for inferring haplotypes from population genotyp...



Publication date: 2007